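Previous vision-language pre-training models mainly construct multi-modal inputs from tokens and objects (pixels) and then perform cross-modal interaction between them. We argue that inputs consisting only of tokens and objects limit high-level semantic alignment, such as phrase-to-region grounding. Meanwhile, multi-level alignments are inherently consistent and can synergistically facilitate representation learning. Therefore, in this paper, we propose to learn multi-level semantic alignment for vision-language pre-training (MVPTR). In MVPTR, we follow the nested structure of both modalities to introduce concepts as high-level semantics. To ease learning from multi-modal, multi-level inputs, our framework is split into two stages. The first stage focuses on intra-modality multi-level representation learning. The second stage enforces cross-modal interaction through both coarse-grained and fine-grained semantic alignment tasks. In addition to the commonly used image-text matching and masked language modeling tasks, we introduce a masked concept recovery task in the first stage to enhance concept representation learning. We also add two further tasks in the second stage to explicitly encourage multi-level alignment across modalities. Our code is available at https://github.com/junction4nako/mvp_pytorch.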
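Conceptual graphs are a particular type of knowledge graph and play an important role in semantic search. Existing conceptual graph construction methods typically extract high-frequency, coarse-grained, and time-invariant concepts from formal texts. In real applications, however, it is necessary to extract less frequent, fine-grained, and time-varying conceptual knowledge and to build the taxonomy in an evolving manner. In this paper, we introduce our approach to implementing and deploying a conceptual graph at Alibaba. Specifically, we propose a framework called AliCG, which is capable of a) extracting fine-grained concepts via an alignment-consensus approach, b) mining long-tail concepts with a novel low-resource phrase mining approach, and c) updating the graph dynamically via a concept distribution estimation method based on implicit and explicit user behavior. We have deployed the framework in Alibaba's UC Browser. Extensive offline evaluation as well as online A/B testing demonstrate the efficacy of our approach.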
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task includes not only the correction of grammatical errors, such as missing prepositions and subject-verb agreement errors, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade. This progress was motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article. We first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed, with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements. We conclude with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
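In this paper, we propose a system combination method for grammatical error correction (GEC) based on nonlinear integer programming (IP). Our method optimizes a novel F-score objective based on error types and combines multiple end-to-end GEC systems. The proposed IP approach optimizes the selection of a single best system for each grammatical error type present in the data. Experiments that use the IP approach to combine state-of-the-art standalone GEC systems show that the combined system outperforms all standalone systems. When combining the two best participating systems in the BEA 2019 shared task, it improves the F0.5 score by 3.61% and achieves an F0.5 score of 73.08%. We also perform experiments comparing our IP approach with another state-of-the-art system combination method for GEC, demonstrating the competitive combination capability of IP.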
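The following is a rough illustration of the selection problem described above, not the authors' method. The sketch picks one system per error type so that the corpus-level F0.5 of the combined output is maximised. It rests on several assumptions that the abstract does not state:

* Each system has already been scored per error type as (TP, FP, FN) edit counts.
* Edits of different types do not interact.
* The objective is F0.5 computed over the summed counts.

In place of an IP solver, it uses exhaustive search over every type-to-system assignment. On small inputs this finds the same optimum, but the number of assignments grows as |systems|^|types|. The functions `f_beta` and `best_assignment`, the system names, and all counts are hypothetical.

```python
from itertools import product


def f_beta(tp, fp, fn, beta=0.5):
    """F-beta score from raw edit counts (0.0 when precision and recall are both 0)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_assignment(counts, beta=0.5):
    """Choose one system per error type to maximise corpus-level F-beta.

    counts[system][error_type] = (tp, fp, fn); every system must report
    every error type. Brute-force search stands in for an IP solver here.
    """
    systems = sorted(counts)
    types = sorted(counts[systems[0]])
    if any(sorted(counts[s]) != types for s in systems):
        raise ValueError("every system needs counts for every error type")

    best, best_score = None, -1.0
    # Enumerate every error_type -> system mapping: |systems| ** |types| options.
    for choice in product(systems, repeat=len(types)):
        tp = fp = fn = 0
        for err_type, system in zip(types, choice):
            t, f, n = counts[system][err_type]
            tp += t
            fp += f
            fn += n
        score = f_beta(tp, fp, fn, beta)
        if score > best_score:
            best, best_score = dict(zip(types, choice)), score
    return best, best_score


# Hypothetical per-type counts for two hypothetical systems.
counts = {
    "sysA": {"PREP": (8, 2, 4), "SPELL": (3, 3, 5)},
    "sysB": {"PREP": (5, 1, 7), "SPELL": (7, 1, 1)},
}
best, score = best_assignment(counts)
print(best, round(score, 4))  # {'PREP': 'sysA', 'SPELL': 'sysB'} 0.8152
```

On these made-up counts, the mixed assignment scores 75/92 ≈ 0.815. This is higher than either system used alone (≈ 0.655 for sysA, ≈ 0.789 for sysB), which is the kind of gain the abstract describes. With many error types, enumeration quickly becomes infeasible. That is presumably why the paper formulates the choice as an integer program.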